An Analysis of Radicals-based Features in Subjectivity Classification on Simplified Chinese Sentences

Authors

  • Ge Xu
  • Chu-Ren Huang
Abstract

Chinese radicals are linguistic elements smaller than Chinese characters. Normally, a radical denotes a semantic category, and almost all characters contain radicals or are radicals themselves. In subjectivity classification of sentences, we can use radicals to represent characters, which reduces the size of the word space while keeping the subjectivity information. In this paper, we manually labeled a character set to build a high-quality radical-character mapping, and then used the mapping to generalize character-based features with radicals. In experiments, we first evaluated the performance of directly generalizing characters with radicals, and then proposed a hypothesis that can reduce noise. Experiments show that the approach based on our hypothesis can reduce the feature space while maintaining or improving performance, which is especially useful when training samples are scarce.

Keywords: sentiment analysis, subjectivity classification, radical, Chinese character
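
The generalization step described in the abstract can be illustrated with a minimal sketch. It assumes character unigram counts as features and uses a tiny hand-written mapping (`RADICAL_OF`, a hypothetical name) in place of the manually labeled radical-character mapping, which is not reproduced here. The noise-reducing hypothesis is not specified in the abstract, so the sketch covers only the direct replacement of characters by radicals.

```python
from collections import Counter

# Hypothetical toy mapping for illustration only; the paper's manually
# labeled radical-character mapping is not available in this abstract.
RADICAL_OF = {
    "快": "忄", "怕": "忄", "恨": "忄",  # heart radical
    "河": "氵", "湖": "氵", "海": "氵",  # water radical
}

def char_features(sentence):
    """Count character unigrams, skipping whitespace."""
    return Counter(ch for ch in sentence if not ch.isspace())

def radical_features(sentence, mapping=RADICAL_OF):
    """Replace each mapped character with its radical before counting.

    Characters without an entry in the mapping are kept unchanged.
    """
    return Counter(mapping.get(ch, ch) for ch in sentence if not ch.isspace())

# Made-up example sentences, not taken from the paper's data.
sentences = ["我怕河", "他恨海", "我快到湖"]

# Vocabulary (distinct features) under each representation.
char_vocab = set().union(*(char_features(s) for s in sentences))
radical_vocab = set().union(*(radical_features(s) for s in sentences))

print(len(char_vocab), len(radical_vocab))
```

On these three toy sentences the number of distinct features drops once characters that share a radical are merged, which is the feature-space reduction the abstract refers to. Whether classification performance is preserved depends on the data and on the paper's hypothesis, which this sketch does not model.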

Similar resources

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...

Predictive Features for Detecting Indefinite Polar Sentences

In recent years, text classification in sentiment analysis has mostly focused on two types of classification, the distinction between objective and subjective text, i.e. subjectivity detection, and the distinction between positive and negative subjective text, i.e. polarity classification. So far, there has been little work examining the distinction between definite polar subjectivity and indef...

Beyond Topicality: Finding Opinionated Chinese Documents

The availability of Web 2.0 technologies has made it easy for information users to express their own opinions and access other people’s opinions on the Web. We are interested in understanding how opinions expressed in one way by one group compare to opinions expressed in another way by another group, especially in a different language. We have done reasonably well at finding opinionated English...

Political Leaning Categorization by Exploring Subjectivities in Political Blogs

This paper addresses a relatively new text categorization problem: classifying a political blog as either ‘liberal’ or ‘conservative’, based on its political leaning. Instead of simply using “Bag of Words” features (BoW) as in previous work, we have explored subjectivity manifested in blogs and used subjectivity information thus found to help build political leaning classifiers. Specifically, o...

Classifying Attitude by Topic Aspect for English and Chinese Document Collections

Yejun Wu, Doctor of Philosophy, 2008. Dissertation directed by Professor Douglas W. Oard, College of Information Studies & Institute for Advanced Computer Studies, UMCP. The goal of this dissertation is to explore the design of tools to help users make sense of subjective information in English and Chinese by...

Publication date: 2014